1.  INTRODUCTION


The emergence of massively parallel computers, such as the present generation of
hypercube machines, is having a significant influence on the development and implementation
of computational models for describing physical phenomena. A pressing concern in the
construction of parallel applications is the mapping of algorithms onto multiprocessors
that can be scaled to the teraflop (10^12 floating point operations per second) performance
range.

An  important  class  of  problems  where  the  principal  limitation  is  CPU  performance  is  the 
large-scale  numerical  solution  of  partial  differential  equations  applied  to  shock  physics 
modeling  in  two  and  three  dimensions.  The  successful  utilization  of  parallel  computers  for 
these  problems  requires  the  adaptation  of  existing  sequential  algorithms  into  reliable  and 
robust  parallel  algorithms. 

This  report  presents  a  brief  overview  of  the  parallel  algorithms  and  data  structures  for 
implementing  a  two-dimensional  (2-D),  multimaterial  kernel  of  the  wave-propagation  code, 
HULL,  on  a  Connection  Machine  (CM).  Computational  performance  is  illustrated  for  a 
prototypical  rod-plate  impact  problem.  Particular  detail  is  given  to  computational  methodology, 
performance  characteristics,  and  algorithm  scalability.  Complementary  parallel  computing 
efforts  for  recently  developed  wave-propagation  codes  are  being  conducted  by  Sandia 
National  Laboratories  (Robinson  et  al.  1990)  and  Los  Alamos  National  Laboratory  (Hopson 
1990). 

2.  THE  CONNECTION  MACHINE 

The Connection Machine CM-2 (Thinking Machines Corporation 1990) is a massively
data-parallel computer configured with a maximum of 64K (2^16) bit-serial processors
interconnected in a boolean hypercube topology. Each processor is equipped with 128
Kbytes of memory, giving a total memory capacity of 8 Gbytes. The processors are arranged
in hardware with 16 processors to a chip, and each pair of chips (referred to as a node) shares
a Weitek floating-point accelerator with 64-bit precision arithmetic.

Floating point computations on the CM-2 are implemented via two models: fieldwise and
slicewise. In the fieldwise model, the atomic unit is the processing element and the storage of
a 32-bit word is allocated in 32 sequential bits of a physical processor's memory. In the
slicewise model, the atomic unit is the processing node and a word is stored in a 32-bit slice
across the memories of the 32 processors in a node. The advantage of the slicewise model
is that it uses the floating-point units more efficiently, because memory-to-memory
operations in the fieldwise model become register-to-register operations in the slicewise
model.

The granularity of the CM-2 is reflected in the application of virtual sets. For the fieldwise
model, this refers to the formation of virtual processors (VPs) and, for the slicewise model, the
abstraction of virtual grids. A VP is a segment of the local memory of a physical processor;
segmenting memory in this way lets the CM-2 simulate a system with more processors than it
physically has. A virtual grid, in contrast to VPs, does not exist as a formal object in CM
memory, but provides a useful way of describing the allocated memory across processing nodes.
The run-time system determines allocated memory within the processing elements and maps
declared array dimensions onto the virtual grids. The execution of instructions by the virtual
sets is performed by time-slicing the physical processing units.
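
The time-slicing of virtual processors can be pictured with a small sketch. This is an illustration only, not CM-2 system code: the function name and the block assignment of elements to processors are assumptions made for the example. Each physical processor handles several array elements, and one instruction is completed in as many passes as there are virtual processors per physical processor.

```python
def run_on_virtual_processors(op, data, physical):
    """Apply op to every element using `physical` processors by time-slicing.

    Hypothetical helper for illustration; returns the result and the number
    of passes (the ratio of virtual to physical processors, rounded up).
    """
    vp_ratio = -(-len(data) // physical)      # ceiling division
    out = list(data)
    passes = 0
    for vp in range(vp_ratio):                # one pass per virtual-processor slot
        for p in range(physical):             # conceptually simultaneous
            idx = vp * physical + p
            if idx < len(data):
                out[idx] = op(data[idx])
        passes += 1
    return out, passes


result, passes = run_on_virtual_processors(lambda x: 2 * x, list(range(10)), physical=4)
```

With 10 elements on 4 processors the instruction takes 3 passes, which is why execution time grows with the number of elements assigned to each processor.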

The CM-2 processing units operate in a Single-Instruction Multiple-Data (SIMD) mode,
meaning all processors receive the same instruction stream on each cycle. Conditional
operations (i.e., masks) permit any subset of the processors to be deselected such that the
instruction will only be performed by those processors in the selected set. The instruction
stream is broadcast by sequencers which are controlled by a conventional front-end machine.
The front-end machine supports the operating and programming environment. Current
languages supported include CM-Fortran, C*, *Lisp, and Paris.
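
Masked SIMD execution can be sketched in a few lines. The example below is a minimal illustration with hypothetical names, not CM-2 code: every element sees the same operation, and a boolean mask decides which elements actually store the result.

```python
def simd_apply(op, data, mask):
    """Apply op to every element whose mask entry is True; leave the rest untouched."""
    return [op(x) if m else x for x, m in zip(data, mask)]


pressure = [1.0, -2.0, 3.0, -4.0]
selected = [p < 0.0 for p in pressure]        # select only the negative entries
clipped = simd_apply(lambda p: 0.0, pressure, selected)
```

The deselected elements still occupy a processor during the operation, so a mask changes which results are kept, not how long the instruction takes.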

Interprocessor communication is carried out using two mechanisms referred to as the
NEWS (North-East-West-South) grid and the router. The addressing of a VP is based on a
Gray-coded grid which provides an n-bit cube address (where n ≤ 16) for specifying the
location of the processor on an n-dimensional hypercube. The NEWS addressing scheme allows
processors to pass data according to a structured rectangular grid. The router, on the other
hand, is the more general mechanism which allows any VP to communicate with any other VP
on the hypercube. In addition, the router allows the local memories of the processors to be
treated as a single, large shared memory. The application of the NEWS grid or the router to a
given problem depends on the data access pattern, which may vary as a function of time. NEWS
communication is the more efficient of the two. As a result, an explicit finite-difference
scheme, such as HULL, maps efficiently onto the CM since communications are mostly
nearest-neighbor (NEWS).
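
The property that makes a Gray-coded grid suit a hypercube can be checked with the standard binary-reflected Gray code. The sketch below is an illustration with hypothetical function names; it does not reproduce the actual CM-2 address layout. It shows that neighboring grid coordinates along one axis receive addresses that differ in exactly one bit, so a nearest-neighbor shift crosses a single hypercube connection.

```python
def gray(i):
    """Binary-reflected Gray code of a non-negative integer."""
    return i ^ (i >> 1)


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Grid coordinates 0..15 along one axis (4 address bits for this axis).
n = 16
codes = [gray(i) for i in range(n)]

# Adjacent coordinates, including the wrap-around pair (15, 0),
# differ in exactly one address bit.
distances = [hamming(codes[i], codes[(i + 1) % n]) for i in range(n)]
```

The wrap-around pair also differs by one bit because the axis length is a power of 2, which matches the circular shifts used for finite differencing later in the report.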

3.  THE  HULL  EULERIAN  HYDROCODE 

The  HULL  code  (Matuska  and  Osborne  1987)  is  a  multidimensional,  multimaterial  Eulerian 
wave-propagation  code  that  numerically  solves  the  partial  differential  equations  of  continuum 
mechanics.  Explicit  terms  for  heat  conduction  and  viscous  effects  are  not  included.  The 
equations  solved  in  axisymmetric  cylindrical  coordinates  for  2-D  are: 


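$$ \frac{d\rho}{dt} + \rho \left[ \frac{\partial (x u)}{x\,\partial x} + \frac{\partial v}{\partial y} \right] = 0, \tag{1.1} $$

$$ \rho \frac{du}{dt} - \frac{\partial (x T_{xx})}{x\,\partial x} - \frac{\partial T_{xy}}{\partial y} + \frac{T_{\theta\theta}}{x} = 0, \tag{1.2} $$

$$ \rho \frac{dv}{dt} - \frac{\partial (x T_{xy})}{x\,\partial x} - \frac{\partial T_{yy}}{\partial y} = -\rho g, \tag{1.3} $$

$$ \rho \frac{dE}{dt} - \frac{\partial}{x\,\partial x} \left[ x \left( u T_{xx} + v T_{xy} \right) \right] - \frac{\partial}{\partial y} \left( u T_{xy} + v T_{yy} \right) = -\rho v g, \tag{1.4} $$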

where $\rho$ is the material density, x and y are the radial and axial coordinates, respectively,
u and v are the corresponding radial and axial velocity components, T is the stress tensor, E is
the total specific energy, and g is the gravitational body force.

Equations 1.1 through 1.4 are solved on a finite-difference, rectangular mesh composed of
discrete spatial intervals, $\Delta x_i$ and $\Delta y_j$, in the radial and axial coordinates.
The solution is advanced explicitly from the initial conditions by discrete time steps,
$\Delta t^n$, and is defined on the mesh $(x_i, y_j, t^n)$, where each of the state variables
$\xi(x, y, t)$ in the solution space is defined by $\xi^n_{ij} = \xi(x_i, y_j, t^n)$.

State variables are defined at the geometric center of each cell. Cell boundary values are
interpolated through one computational cycle via cell-centered values from nearest-neighbor
cells. These boundary values are then advanced through one-half time step using cell-center
to cell-center gradients. This step is then followed by a full time step using half-time
advanced cell-boundary gradients. Lagrangian conservation (Equations 1.1-1.4) is utilized in
this time update. To maintain the original Eulerian mesh, material is advected from one cell to
another via a first-order donor cell algorithm with a heuristic, multimaterial diffusion limiter to
preserve material interfaces.
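
The donor cell idea can be illustrated with a generic one-dimensional, single-material sketch. It is a textbook first-order upwind update on a periodic mesh, written under stated assumptions; it omits the multimaterial diffusion limiter and is not the HULL routine. The names `donor_cell_step` and `u_face` are hypothetical.

```python
def cshift(a, shift):
    """Circular shift: result[i] = a[(i + shift) % len(a)]."""
    s = shift % len(a)
    return a[s:] + a[:s]


def donor_cell_step(rho, u_face, dx, dt):
    """One first-order donor-cell update of a cell-centered density.

    u_face[i] is the velocity on the face between cell i and cell i+1
    (periodic mesh). The flux through a face takes its density from the
    upstream (donor) cell.
    """
    rho_right = cshift(rho, 1)
    flux = [u * (r if u >= 0.0 else rr)
            for u, r, rr in zip(u_face, rho, rho_right)]
    flux_left = cshift(flux, -1)          # flux through the left face of cell i
    return [r - dt / dx * (f - fl) for r, f, fl in zip(rho, flux, flux_left)]


rho = [1.0, 1.0, 2.0, 1.0, 1.0]
new_rho = donor_cell_step(rho, [1.0] * 5, dx=1.0, dt=0.5)
```

With a uniform rightward velocity, half of the excess density in the middle cell moves into its right neighbor and the total is conserved. The smearing of the initially sharp peak is the numerical diffusion that a limiter is meant to restrain.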

Material  models  in  HULL  include  elastic-perfectly  plastic  with  von  Mises  yield  criterion  as 
well  as  temperature  and  work  hardening  effects.  The  Mie-Gruneisen  equation  of  state  (EOS) 
is  used  to  model  solids  and  liquids,  and  the  gamma  law  is  used  to  model  gases.  Explosives 
are  modeled  via  the  Jones-Wilkins-Lee  EOS.  Material  failure  models  include  maximum 
principal  stress,  maximum  principal  strain,  and  the  Hancock-Mackenzie  triaxial  failure  model. 

4.  PARALLEL  IMPLEMENTATION  OF  HULL 

The complexity of adapting the HULL code to a parallel platform depends on several
factors, namely the degree of parallelism, granularity and scalability, interprocessor
communication, and I/O demands. To achieve high performance, the data-parallel layout
must maximize processor load and streamline interprocessor communication.

4.1 CM-2 Data Structure. The algorithmic framework for mapping the HULL data
structure onto the CM-2 architecture relies on both the canonical layout of arrays and the
compiler array directive LAYOUT (Thinking Machines Corporation 1989).

Hydrodynamic variable arrays for pressure, velocity, stresses, and strains are canonically
allocated one element per VP,* with each conformable array being placed in the same virtual
set. Conformable arrays have the same shape. Array dimensions are defined in 2-D as
(0:nx, 0:ny), where nx and ny are the number of hydrodynamic computational cells in the
x and y spatial directions, respectively. Each array is buffered with fictitious cells (Figure 1)
containing the appropriate boundary conditions. Both transmissive and reflective boundary
conditions are supported.

* Formally, a distinction should be made between fieldwise and slicewise mapping of arrays.
For details, see Thinking Machines Corporation (1991).

Fictitious cells are incorporated into the mesh to perform uniform computations on all
active cells at all times, independent of whether the cells are internal or boundary cells. This
approach maximizes processor utilization during a clock cycle for the Lagrangian and
advection computations, thereby decreasing the overall computational grind time. The
boundary conditions for the top and right are carried out in parallel while the densities of the
fictitious cells are being numerically updated.
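
How fictitious cells let every active cell use the same stencil can be sketched for one row of the mesh. The code below is a common ghost-cell treatment written for illustration; the function name, its arguments, and the default boundary types are assumptions, and the actual HULL boundary routines may differ.

```python
def add_fictitious_cells(values, left="reflective", right="transmissive",
                         normal_velocity=False):
    """Return a copy of a 1-D row with one fictitious cell on each end.

    transmissive: the fictitious cell copies its active neighbor.
    reflective:   it mirrors the neighbor; a velocity component normal to
                  the boundary changes sign, other quantities are copied.
    """
    def ghost(neighbor, kind):
        if kind == "reflective" and normal_velocity:
            return -neighbor
        return neighbor

    return [ghost(values[0], left)] + list(values) + [ghost(values[-1], right)]


density = add_fictitious_cells([7.8, 7.8, 7.9])
u = add_fictitious_cells([0.5, 0.2, 0.1], normal_velocity=True)
```

Once the row is padded, the interior update needs no special case at the edges, which is the uniformity the paragraph above describes.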

All grid axes for the hydrodynamic variable arrays are NEWS-ordered (Figure 2).
Elemental operations between the arrays in a virtual set require no interprocessor
communication, and dimensional shifts on cells, as required in finite-difference schemes, are
performed with NEWS communication.

The  compiler  directive  LAYOUT  allows  the  programmer  to  specify  the  axis  ordering  and 
weights  of  the  virtual  set  in  which  an  array  is  allocated.  An  important  application  of  LAYOUT 
is  for  arrays  with  mixed  data-parallel  (NEWS-ordered)  dimensions  and  serial  dimensions.  An 
example  is  the  mass  array  shown  in  Figure  3.  Elements  are  given  by  xm(  :SERIAL,  :NEWS, 
:NEWS  ),  where  the  SERIAL  dimensions  span  the  number  of  materials  (denoted  by  nm)  and 
NEWS  the  mesh  space.  Computations  over  the  serial  dimensions  are  performed  via  the 
front-end  computer  whereas  the  data-parallel  dimensions  are  performed  on  the  CM-2.  Similar 
mixed  arrays  are  constructed  for  material  volumes  and  energies.  Each  mixed  array  can  be 
viewed  as  an  indexed  collection  (i.e.,  a  material  slice)  of  data-parallel  arrays. 

4.2 Lagrangian Computations. The cornerstone in reprogramming the Lagrangian step for
SIMD operations lies in the functionality of the NEWS communication. Finite-difference
schemes are implemented via the application of intrinsic shift functions performed on
data-parallel arrays.

Figure 3. Data-Parallel Material-Indexed Hydrodynamic Variable Arrays.

As an example, the finite-difference representation for the u-component of velocity
computed at the cell boundary i + 1/2 at time $t = t^n$ is given by:

•  Serial Lagrangian

$$ u^n_{i+1/2,j} = \frac{\rho^n_{ij} u^n_{ij} + \rho^n_{i+1,j} u^n_{i+1,j}}{\rho^n_{ij} + \rho^n_{i+1,j}} \tag{2} $$

•  Data-Parallel Lagrangian

$$ u^n_{1/2} = \frac{\rho^n u^n + \mathrm{cshift}(\rho^n u^n, 2, 1)}{\rho^n + \mathrm{cshift}(\rho^n, 2, 1)} \tag{3} $$

The key point is the replacement of sequential operations on array elements $\rho^n_{ij}$,
$u^n_{ij}$ with the global uniform operation on data-parallel arrays $\rho^n$, $u^n$. The
circular shift, cshift($\rho^n$, 2, 1), has the effect of shifting the data-parallel array
$\rho^n$ to the left by one position. This operation is one of the most efficient CM-Fortran
operations due to the direct mapping onto the NEWS communication grid. (A caveat is that the
grid dimensions must be a power of 2 for fieldwise and multiples of 4 for slicewise.)
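
The equivalence of the serial and data-parallel forms of the density-weighted boundary velocity can be checked on one row of cells. The sketch below is a pure-Python stand-in: `cshift` here is a hypothetical one-dimensional helper that mimics a circular shift (a positive shift moves elements toward lower indices), not the CM-Fortran intrinsic itself.

```python
def cshift(a, shift):
    """Circular shift: result[i] = a[(i + shift) % len(a)]."""
    s = shift % len(a)
    return a[s:] + a[:s]


def boundary_velocity_serial(rho, u):
    """Element-by-element loop over cells i and their right neighbors i+1."""
    n = len(rho)
    out = []
    for i in range(n):
        ip = (i + 1) % n
        out.append((rho[i] * u[i] + rho[ip] * u[ip]) / (rho[i] + rho[ip]))
    return out


def boundary_velocity_parallel(rho, u):
    """Whole-array form: each line is one uniform operation on full arrays."""
    mom = [r * v for r, v in zip(rho, u)]
    num = [a + b for a, b in zip(mom, cshift(mom, 1))]
    den = [a + b for a, b in zip(rho, cshift(rho, 1))]
    return [a / b for a, b in zip(num, den)]


rho = [1.0, 2.0, 4.0, 8.0]
u = [3.0, 0.0, 1.0, -1.0]
serial = boundary_velocity_serial(rho, u)
parallel = boundary_velocity_parallel(rho, u)
```

Both versions give identical values. The value produced by the wrap-around at the last index has no physical meaning; on a mesh padded with fictitious cells it falls in a cell that is overwritten or masked.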

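The data-parallel solution for the Lagrangian Equations 1.2-1.4 with the assumption
$\Delta t = \Delta t^n$ is given by:

[Equations 4.1 through 4.3: expressions not recoverable from the source text.]

where $P^n$ and $S^n_{\varphi\lambda}$ for ($\varphi = x, y$; $\lambda = x, y$) are data-parallel
arrays for pressure and stress deviator, respectively, and $\delta^n_\lambda$ is the spatial
derivative

$$ \delta^n_\lambda \xi^n = \left( \xi^n_{1/2} - \mathrm{cshift}(\xi^n_{1/2}, dim, -1) \right) / \Delta\lambda $$

with $\xi^n_{1/2}$ defined as the spatial-centered term, dim = 1, 2 depending on whether
$\lambda$ = x or y, and $\Delta\lambda = \mathrm{cshift}(\lambda, dim, 1) - \lambda$.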
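
A one-dimensional sketch of this shifted difference follows. It assumes the operator has the form (boundary value minus its circularly shifted copy) divided by the local spacing, and it uses a hypothetical pure-Python `cshift` and `delta` rather than CM-Fortran.

```python
def cshift(a, shift):
    """Circular shift: result[i] = a[(i + shift) % len(a)]."""
    s = shift % len(a)
    return a[s:] + a[:s]


def delta(xi_half, coord):
    """Difference of boundary-centered values across each cell, per unit length.

    xi_half[i] is the value on the boundary to the right of cell i, and
    coord[i] is the coordinate used to form the spacing for cell i.
    """
    spacing = [b - a for a, b in zip(coord, cshift(coord, 1))]
    left = cshift(xi_half, -1)            # boundary value to the left of cell i
    return [(r - l) / d for r, l, d in zip(xi_half, left, spacing)]


coord = [0.0, 1.0, 2.0, 3.0]
xi_half = [1.0, 3.0, 5.0, 7.0]            # linear profile with slope 2
d = delta(xi_half, coord)
```

The interior cells recover the slope of 2 exactly. The first and last entries are contaminated by the wrap-around and correspond to cells that the mesh treats as fictitious.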

Data-parallel expressions for the half-time-advanced boundary pressures are given by
Equation 5 for the radial direction and Equation 6 for the axial direction.

[Equations 5 and 6: expressions not recoverable from the source text.]

The spatial-centered pressures of Equations 5 and 6 are defined by

$$ P^n_{1/2} = \frac{\mathrm{cshift}(P^n, dim, 1)\,\rho^n + P^n\,\mathrm{cshift}(\rho^n, dim, 1)}{\rho^n + \mathrm{cshift}(\rho^n, dim, 1)} $$

with dim depending on either the radial or axial direction. The $(\rho C_s^2)^n_{1/2}$ term in
Equations 5 and 6, where $C_s$ is the isentropic sound speed, is given by

$$ (\rho C_s^2)^n_{1/2} = \min\left[ (\rho C_s^2)^n,\ \mathrm{cshift}((\rho C_s^2)^n, dim, 1) \right] $$

with $(\rho C_s^2)^n$ computed via the EOS.
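
The two boundary-centered quantities described above can be sketched on a single row. The code assumes the cross-weighted pressure average (each cell's pressure weighted by its neighbor's density) and the neighbor minimum for the sound-speed term; the helper names are hypothetical and the arrays are one-dimensional for brevity.

```python
def cshift(a, shift):
    """Circular shift: result[i] = a[(i + shift) % len(a)]."""
    s = shift % len(a)
    return a[s:] + a[:s]


def centered_pressure(P, rho):
    """Boundary pressure: neighbor pressure weighted by own density and vice versa."""
    P1, rho1 = cshift(P, 1), cshift(rho, 1)
    return [(p1 * r + p * r1) / (r + r1)
            for p, p1, r, r1 in zip(P, P1, rho, rho1)]


def centered_bulk_term(rho_c2):
    """Boundary value of rho*Cs^2: the smaller of the two adjacent cell values."""
    return [min(a, b) for a, b in zip(rho_c2, cshift(rho_c2, 1))]


P = [1.0, 3.0, 3.0]
rho = [1.0, 3.0, 1.0]
P_half = centered_pressure(P, rho)
bulk_half = centered_bulk_term([4.0, 1.0, 9.0])
```

Where the two neighboring pressures are equal, the centered value equals that pressure regardless of the densities, as in the middle entry of `P_half`.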

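Data-parallel time advanced velocities in Equation 4.3 are computed via the following
expressions:

$$ u^{n+1/2}_{1/2} = u^n_{1/2} - \frac{\Delta t}{2\max(\rho^n, \mathrm{cshift}(\rho^n, 2, 1))} \, \frac{\mathrm{cshift}(P^n, 2, 1) - P^n}{\Delta x}, $$

$$ v^{n+1/2}_{1/2} = v^n_{1/2} - \frac{\Delta t}{2\max(\rho^n, \mathrm{cshift}(\rho^n, 1, 1))} \, \frac{\mathrm{cshift}(P^n, 1, 1) - P^n}{\Delta y} - \frac{\Delta t}{4} \left( g + \mathrm{cshift}(g, 1, 1) \right), $$

where $u^n_{1/2}$ is given by Equation 3 and $v^n_{1/2}$ has an analogous form.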

Similar computations are carried out for the stress deviators. The numerical solution in a
data-parallel format is obtained explicitly by

$$ (S_{\varphi\lambda})^n_{1/2} = \Phi_{1/2} \, \frac{S^n_{\varphi\lambda}\,\mathrm{cshift}(\rho^n, dim, 1) + \mathrm{cshift}(S^n_{\varphi\lambda}, dim, 1)\,\rho^n}{\rho^n + \mathrm{cshift}(\rho^n, dim, 1)} $$

where

$$ \Phi_{1/2} = \min(VF, \mathrm{cshift}(VF, dim, 1)) $$

with VF defined as a data-parallel array describing the fractional volume of solid in a given
computational cell. The stress deviators are numerically updated and are subject to the
von Mises yield criterion.

The application of the boundary conditions for the Lagrangian and advection computations
is implemented through the use of data-parallel selector arrays containing values of 1.0 for
selecting computational cells and values of 0.0 for deselecting cells. Selector arrays, or
masked array assignments, are implemented using the WHERE statement. For example, the
left reflective boundary condition for $u^n_{1/2}$ given by Equation 3 requires the left
fictitious cells to hold the temporary value $u^n_{1/2} = 0.0$. This is accomplished by
multiplying the data-parallel expression for $u^n_{1/2}$ by an array containing 1.0 for all
active cells and 0.0 for the left fictitious cells. Similar selector arrays are employed for
implementing analogous boundary conditions.
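
The selector-array technique amounts to an elementwise multiplication. The sketch below uses one row of the mesh with illustrative values, taking index 0 as the left fictitious cell; the names are hypothetical. It also shows the equivalent masked assignment in the style of a WHERE statement.

```python
def apply_selector(field, selector):
    """Multiply a field by a 1.0/0.0 selector array, element by element."""
    return [f * s for f, s in zip(field, selector)]


def where_assign(field, mask, value):
    """Masked assignment: set field to value wherever mask is True."""
    return [value if m else f for f, m in zip(field, mask)]


# One row of the mesh; index 0 is the left fictitious cell.
u_half = [0.7, 0.4, 0.2, 0.1]
left_reflective = [0.0, 1.0, 1.0, 1.0]

by_selector = apply_selector(u_half, left_reflective)
by_where = where_assign(u_half, [s == 0.0 for s in left_reflective], 0.0)
```

Both forms zero the boundary velocity in the fictitious cell and leave the active cells unchanged.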

4.3 EOS Computations. EOS calculations are, in general, good candidates for the SIMD
data parallelism of the CM-2. They are characterized as being free of both interprocessor
communication and grid boundary conditions. However, for multimaterial problems, EOS
calculations are inherently Multiple-Instruction Multiple-Data (MIMD) type operations. The
MIMD nature is due to the nonhomogeneity of the computations derived from materials with
different EOS formulations (e.g., gamma law and Mie-Gruneisen) and different material
parameters characterizing the same EOS (e.g., steel and rolled homogeneous armor [RHA]).
Moreover, mixed material cells, which require an iterative procedure to equilibrate the
pressure for each material, induce a MIMD style of programming. Figure 4 depicts
schematically the general condition for computing the EOS for a three-material simulation.

The most direct method for computing pressures employing analytic EOS expressions of
the form $P = P(\rho, I)$, where I is the internal energy, is one which calculates cell
pressures in parallel (partial pressures for mixed material cells) as part of a sequential loop
over all materials. The calculated result is placed in a data-parallel scratch array pp(im,:,:),
where im is the material index. A logical mask is then used to segregate pure and mixed cells,
with mixed cells requiring further calculations.
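
The loop structure described above can be sketched as follows. For brevity both materials use the gamma-law EOS, P = (gamma - 1) * rho * I, with illustrative parameters, and the mesh is flattened to three cells; the array names other than `pp` are hypothetical, and the test for a mixed cell (more than one material with nonzero volume fraction) is an assumption made for the example.

```python
def gamma_law(rho, energy, gamma):
    """Gamma-law EOS, P = (gamma - 1) * rho * I, applied to every cell at once."""
    return [(gamma - 1.0) * r * e for r, e in zip(rho, energy)]


gammas = [1.4, 1.67]                          # illustrative material parameters
rho = [[1.0, 0.5, 0.0], [0.0, 2.0, 3.0]]      # rho[im][cell]
energy = [[2.5, 2.5, 0.0], [0.0, 1.0, 1.0]]   # internal energy, energy[im][cell]
vol_frac = [[1.0, 0.4, 0.0], [0.0, 0.6, 1.0]]

# Sequential loop over materials; each pass is one data-parallel operation.
pp = [gamma_law(rho[im], energy[im], gammas[im]) for im in range(len(gammas))]

# Logical mask separating mixed cells from pure cells.
ncell = len(rho[0])
mixed = [sum(1 for vf in vol_frac if vf[c] > 0.0) > 1 for c in range(ncell)]
```

Every pass touches all cells, including those that do not contain the material in question, so the work grows with the number of materials.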

The problem with this method is twofold. First, there is an nm-fold increase in the set of
required computations due to the sequential loop over the materials rather than one
data-parallel SIMD computation. This can be somewhat relaxed for materials with identical
EOS formulations by introducing data-parallel material property arrays for each material at
each VP (or virtual grid). For virtual sets with identical materials, one array would be required.
Unfortunately, this determination is dynamic and not static.

The second problem deals with mixed material cells. Each mixed cell undergoes a volume
iteration in an effort to compute an equilibrium pressure. During this iteration, the VPs (or
virtual grids) which hold pure cells are conditionally masked such that they are inactive. As
the number of iterations and mixed cells grows, the relative cycle throughput of SIMD
operations decreases. Similar problems occur during the advection phase. Eliminating
these problems requires asynchronous constructs, which are not supported on a SIMD platform.
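
The loss of throughput can be estimated with a simple cost model. The model is an assumption made for illustration, not a measurement from this work: all cells are held for as many passes as the slowest mixed cell needs, and only the passes a cell actually requires count as useful work.

```python
def simd_utilization(iterations_per_cell):
    """Fraction of processor-passes doing useful work under SIMD masking.

    Hypothetical cost model: every cell occupies its processor for
    max(iterations) passes, but only its own iteration count is useful.
    """
    passes = max(iterations_per_cell)
    if passes == 0:
        return 1.0
    useful = sum(iterations_per_cell)
    return useful / (passes * len(iterations_per_cell))


# 100 cells: 95 pure cells need no volume iteration, 5 mixed cells need 10 each.
iterations = [0] * 95 + [10] * 5
utilization = simd_utilization(iterations)
```

In this example only 5 percent of the processor-passes in the iteration loop do useful work, which shows why a small population of mixed cells can dominate the cost.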

The SIMD methodology for computing material strength is similar to that for computing
pressures. Scratch data-parallel arrays are employed to store temporary values of the shear
modulus, yield strength, stress deviators, etc. for both pure and mixed material cells during
volume iterations. Upon convergence, all cell values are reloaded into their respective
hydrodynamic variable arrays.

4.4 Advection Computations. As mentioned above, HULL advects materials based on a
first-order donor cell method. The calculation of the relative transport weights for apportioning
the volume flux is carried out using the intrinsic cshift function for computing the fractional
volumes in the receiver and upstream cells. A diffusion limiter algorithm is employed in an
attempt to unmix mixed material cells.

Figure 4. HULL EOS Computations.

The material slices for computing transport terms are stored in a data-parallel array
hs(:SERIAL, :SERIAL, :NEWS, :NEWS), where the SERIAL dimensions cover the number of
materials and spatial flux directions (four in 2-D), respectively. The NEWS-ordered
dimensions span the mesh space and are conformable with the advected hydrodynamic
variable arrays. Volume iterations are required to reduce the flux of over-emptied materials.
Convergence is checked by monitoring a data-parallel array consisting of ones and zeros.

The final remapping step is transparent in its implementation, using simple grid
finite-difference quantities computed via cshift operations. For example, the volume of
material n, denoted by the data-parallel array $V_n$, is advected to the original fixed Eulerian
mesh,

$$ V^{E}_n = V_n + \delta V_{nl}(\mathrm{left}) + \delta V_{nl}(\mathrm{bottom}) - \delta V_{nl}(\mathrm{right}) - \delta V_{nl}(\mathrm{above}) $$

$$ \phantom{V^{E}_n} = V_n + \mathrm{cshift}(\delta V_{nl}, 2, -1) + \ldots $$

where $V^{E}_n$ is the Eulerian volume and $\delta V_{nl}$ is the transporting volume
[expression not recoverable from the source text], with $A_l$ defined as the transport
fraction for each material in a particular direction l. Active cells are advected while
fictitious cells along with inactive cells are masked.
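
The role of the circular shift in the remap can be sketched in one dimension. The example assumes that all material moves to the right, so that the volume entering a cell from the left is the shifted copy of the volume leaving its left neighbor; the two-dimensional remap has four such directions. The names `remap_1d` and `dV_right` are hypothetical.

```python
def cshift(a, shift):
    """Circular shift: result[i] = a[(i + shift) % len(a)]."""
    s = shift % len(a)
    return a[s:] + a[:s]


def remap_1d(V, dV_right):
    """Return cell volumes after transport through the right-hand faces.

    dV_right[i] is the volume leaving cell i through its right face.
    What leaves cell i-1 to the right enters cell i from the left.
    """
    inflow = cshift(dV_right, -1)
    return [v + i - o for v, i, o in zip(V, inflow, dV_right)]


V = [1.0, 1.0, 1.0, 1.0]
dV_right = [0.25, 0.5, 0.0, 0.0]
V_new = remap_1d(V, dV_right)
```

Every transported volume is subtracted from one cell and added to its neighbor, so the total material volume is unchanged.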

5.  APPLICATION  AND  PERFORMANCE  RESULTS 

The application we report here as an illustration of the computational performance is a
2-D, multimaterial computation of a steel rod impacting RHA at a striking velocity of 3 km/s
(Figure 5). The steel rod has a length-to-diameter ratio of 5. Material strength was
implemented via an elastic-perfectly plastic formulation, with the hydrodynamic behavior of
materials modeled using the gamma law and Mie-Gruneisen EOS.

Calculations  were  performed  on  a  16K  segment  of  a  32K-processor  CM-2  located  at  the 
University  of  Minnesota.  The  total  memory  capacity  is  4  Gbytes  with  a  DataVault  of 
10  Gbytes.  The  front-end  is  a  VAX  6420  with  64  Mbytes  of  memory  running  the  ULTRIX 
operating  system.  Reprogramming  of  the  HULL  code  was  carried  out  using  CM-Fortran  with 
double-precision  arithmetic  implemented  via  the  slicewise  compiler. 

Results for the grind times (microseconds/cell/cycle) computed on the CM-2 for various mesh
sizes, along with the corresponding CRAY-2 single-processor results, are presented in Table 1.
All mesh dimensions were multiples of 4, which optimizes the layout of data on the CM;
meshes whose dimensions are not multiples of 4 will yield lower performance. In general,
performance on the CRAY-2 is roughly constant (i.e., independent of problem size).

A comparison of the computed grind times shows that the 16K-processor CM-2 is faster than
a single CRAY-2 processor. For a 512 x 512 mesh the CM-2 is about 12 times faster (16 versus
196 microseconds/cell/cycle). Note that the grind times for a fixed CM-2 configuration
decrease nonlinearly as the virtual grid length increases.

Table 1. HULL Hydrocode Performance Results on the CM-2 (a)

               |------------------- CM-2 --------------------|     CRAY-2
Grid Size       VG Length (b)   Efficiency (c)   Grind Time (d)    Grind Time (d)
128 x 128            32             0.87              39                --
256 x 256           128             0.93              22                --
512 x 512           512             0.97              16               196

(a) CM-Fortran with double precision using slicewise compiler on a 16K segment.
(b) VG (virtual grid) length = number of grid points/number of FPUs.
(c) Efficiency = CM-2 execution time/CM-2 elapsed time.
(d) Grind time = microseconds/cell/cycle.
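
The virtual grid lengths and the speedup quoted for Table 1 can be reproduced with a few lines of arithmetic. The sketch takes the FPU count from the node description in Section 2 (one floating-point unit per 32 processors); the extrapolation to 64K processors assumes ideal scaling, which is an assumption of the example and not a measured result.

```python
PROCESSORS = 16 * 1024              # 16K segment used for the runs
PROCESSORS_PER_FPU = 32             # one FPU per node of 32 processors
fpus = PROCESSORS // PROCESSORS_PER_FPU

# VG length = number of grid points / number of FPUs
grids = [128, 256, 512]
vg_length = {n: (n * n) // fpus for n in grids}

# Grind times for the 512 x 512 mesh, microseconds/cell/cycle (Table 1)
cm2_grind_512, cray2_grind_512 = 16.0, 196.0
speedup_16k = cray2_grind_512 / cm2_grind_512

# Assumes the grind time scales ideally from 16K to 64K processors.
speedup_64k = speedup_16k * (64 // 16)
```

The computed virtual grid lengths match the table, and the extrapolated speedup of 49 is consistent with a figure of about 50 for a full machine.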


The observed improvement in efficiency as a function of data set size is due to the
amortization of the start-up overhead over large blocks of computations and to some of the
communication occurring on the same chip. The overall SIMD parallelism performance of the
HULL code is limited by the EOS solution procedure employed in solving for mixed cells.
Recently developed EOS methods (McGlaun, Thompson, and Elrick 1990) appear to be more
amenable to the data parallelism of the CM-2.

6.  CONCLUSIONS 

In this report, we have presented the initial step toward the adaptation of the HULL code
for the Connection Machine. For a prototypical rod-plate impact calculation on a 512 x 512
mesh, the parallel implementation on a 16K-processor CM-2 ran about 12 times faster than a
single CRAY-2 processor. Extrapolating the CM-2 grind times to a full 64K-processor machine
suggests that this machine is capable of roughly 50 times the performance of the CRAY-2 for
executing the HULL code. However, performance is limited by the EOS calculation for the
multimaterial mixed cells.